This analysis applies the Bioconductor package fastseg to segment chromosomes based on a numeric variable, such as DNA copy number or fold change of RNA transcription. Please refer to the package manual for a full description. In summary, fastseg implements a fast and efficient segmentation algorithm based on the cyber t-test (Baldi and Long, 2001). Segments identified by the algorithm are then summarized and compared to segments derived from randomized data in terms of their frequency, length, size, and mean of the numeric variable (copy number, fold change, etc.).
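As a rough illustration of the segmentation step, the sketch below calls `fastseg()` on a simulated vector with one shifted block. The simulated data and the choice of `minSeg = 4` are assumptions made for illustration only; they are not the inputs or settings used for this report. The call is guarded so the code still runs when the package is not installed.

```r
# Simulated values along one chromosome: a shifted block in the middle.
# These numbers are made up and are not the GSE55504 data.
set.seed(1)
x <- c(rnorm(200, mean = 0), rnorm(60, mean = 2), rnorm(200, mean = 0))

if (requireNamespace("fastseg", quietly = TRUE)) {
  # fastseg() accepts a numeric vector and returns a GRanges object
  # with one row per segment (including the segment mean).
  segs <- fastseg::fastseg(x, minSeg = 4)
} else {
  message("fastseg is not installed; install it from Bioconductor to run this step.")
}
```

In a real analysis the values would be ordered by genomic position within each chromosome before segmentation.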
Genetic Modifiers in Trisomy 21 Leukemogenesis.
GEO public data set. RPKM and log2FC values were downloaded from GSE55504.
Log2 fold change between one pair of monozygotic twins (T2N_Rep0 vs. T1DS_Rep0). The goal is to regenerate Figure 1a of the original paper.
Parameters:
Variable name: log_rpkm_T2N_Rep0
Table 1. Brief summary of inputs and outputs.
| Description | Value |
|---|---|
| Total number of loci | 13130 |
| Total number of chromosomes | 23 |
| Range of values | -2.69857 to 3.942649 (mean=-0.04061339) |
| Number of segments | 83 |
| Length of segments | 6 to 939 (mean=158.1928) |
| Size of segments | 6 to 939 (mean=158.1928) |
| Mean of segments | -1.457923 to 3.446533 (mean=0.1998589) |
Figure 1. Global view of segmentation across all chromosomes (in alternating colors). Red lines indicate segment locations. Figures for individual chromosomes are available for download.
Figure 2. Distribution of log_rpkm_T2N_Rep0: original values at all individual loci vs. segment means.
Selection of significant segments using given criteria.
Table 2. Summary of selected segments: location, length, size, and log_rpkm_T2N_Rep0 at individual loci. The loci and segmentation columns link to the full list of loci within each segment and to a Manhattan plot of the segmentation.
| segment | chromosome | start | end | length | size | mean | minimum | maximum | variance | loci | segmentation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| segment_1 | chr1 | 11869 | 1297157 | 1285289 | 42 | 1.8237 | -2.9879 | 7.7200 | 2.8884 | table | figure |
| segment_2 | chr1 | 1309110 | 151138424 | 149829315 | 939 | 0.1195 | -3.3195 | 9.7265 | 2.3538 | table | figure |
| segment_3 | chr1 | 151138498 | 161332984 | 10194487 | 167 | 1.6856 | -3.2377 | 9.3922 | 2.4026 | table | figure |
| segment_4 | chr1 | 161334521 | 249231242 | 87896722 | 388 | -0.4992 | -3.3104 | 7.4969 | 2.1171 | table | figure |
| segment_5 | chr2 | 217730 | 73302747 | 73085018 | 277 | -0.2305 | -3.3173 | 9.4369 | 2.5590 | table | figure |
| segment_6 | chr2 | 73300510 | 85824736 | 12524227 | 57 | 1.2180 | -3.1117 | 10.6000 | 2.6672 | table | figure |
| segment_7 | chr2 | 85825671 | 217071026 | 131245356 | 420 | -0.6368 | -3.3177 | 10.5159 | 2.1969 | table | figure |
| segment_8 | chr2 | 217122588 | 223809357 | 6686770 | 56 | 1.0739 | -2.9800 | 8.5067 | 2.5086 | table | figure |
| segment_9 | chr2 | 223782105 | 239077541 | 15295437 | 55 | -0.4999 | -3.0342 | 6.8650 | 2.1584 | table | figure |
| segment_10 | chr2 | 239072633 | 242708231 | 3635599 | 38 | 0.7064 | -2.9968 | 5.5398 | 2.1500 | table | figure |
| segment_11 | chr3 | 3168600 | 48542259 | 45373660 | 211 | -0.3001 | -3.3113 | 7.6221 | 2.2196 | table | figure |
| segment_12 | chr3 | 48555117 | 52740048 | 4184932 | 106 | 1.3100 | -3.2285 | 8.0900 | 2.6327 | table | figure |
| segment_13 | chr3 | 52738971 | 197766105 | 145027135 | 431 | -0.3900 | -3.3180 | 9.3609 | 2.3013 | table | figure |
| segment_14 | chr4 | 53179 | 8442450 | 8389272 | 67 | 0.8188 | -3.0207 | 6.4850 | 2.1142 | table | figure |
| segment_15 | chr4 | 8437867 | 190884359 | 182446493 | 294 | -0.8525 | -3.3022 | 8.8853 | 2.1151 | table | figure |
| segment_16 | chr5 | 140373 | 172462448 | 172322076 | 458 | -0.2548 | -3.3203 | 8.9192 | 2.3591 | table | figure |
| segment_17 | chr5 | 172571445 | 178510538 | 5939094 | 44 | 1.5140 | -3.2982 | 7.0060 | 2.9276 | table | figure |
| segment_18 | chr5 | 178537852 | 180699168 | 2161317 | 21 | 0.7783 | -2.7938 | 7.1790 | 2.9103 | table | figure |
| segment_19 | chr6 | 142272 | 30032686 | 29890415 | 164 | -0.6928 | -3.3009 | 9.4131 | 2.2918 | table | figure |
| segment_20 | chr6 | 30034486 | 34712450 | 4677965 | 132 | 1.8960 | -3.1708 | 10.4208 | 2.6767 | table | figure |
| segment_21 | chr6 | 34725183 | 170893780 | 136168598 | 397 | -0.2978 | -3.3127 | 10.0683 | 2.5031 | table | figure |
| segment_22 | chr7 | 182935 | 75115548 | 74932614 | 314 | -0.1027 | -3.3166 | 10.3223 | 2.4028 | table | figure |
| segment_23 | chr7 | 75123401 | 97528427 | 22405027 | 58 | -0.8639 | -3.3109 | 10.4613 | 2.9438 | table | figure |
| segment_24 | chr7 | 97576299 | 99753567 | 2177269 | 40 | 0.7271 | -3.0820 | 5.9881 | 2.3753 | table | figure |
| segment_25 | chr7 | 99752043 | 102184211 | 2432169 | 55 | 1.2903 | -2.9637 | 9.7112 | 2.9970 | table | figure |
| segment_26 | chr7 | 102178365 | 158622944 | 56444580 | 229 | -0.0874 | -3.2913 | 9.8935 | 2.5625 | table | figure |
| segment_27 | chr8 | 163251 | 144380231 | 144216981 | 318 | -0.5191 | -3.3051 | 7.7725 | 2.1373 | table | figure |
| segment_28 | chr8 | 144386554 | 145583036 | 1196483 | 41 | 2.0250 | -2.4528 | 7.4860 | 2.9499 | table | figure |
| segment_29 | chr8 | 145577795 | 146281416 | 703622 | 27 | 1.2915 | -3.0405 | 9.4505 | 3.1229 | table | figure |
| segment_30 | chr9 | 14511 | 130210909 | 130196399 | 403 | -0.4408 | -3.3189 | 9.7481 | 2.3412 | table | figure |
| segment_31 | chr9 | 130209953 | 132484875 | 2274923 | 70 | 1.8107 | -3.2933 | 6.0126 | 2.0579 | table | figure |
| segment_32 | chr9 | 132500610 | 136522435 | 4021826 | 51 | 0.7512 | -3.3005 | 10.5682 | 2.7385 | table | figure |
| segment_33 | chr9 | 136528682 | 140764468 | 4235787 | 75 | 1.7690 | -3.3116 | 9.4640 | 2.7151 | table | figure |
| segment_34 | chr10 | 180405 | 74114988 | 73934584 | 222 | -0.6039 | -3.2854 | 11.3180 | 2.1665 | table | figure |
| segment_35 | chr10 | 74127098 | 79689582 | 5562485 | 49 | 0.2808 | -3.2488 | 4.0006 | 2.1597 | table | figure |
| segment_36 | chr10 | 79729008 | 97416463 | 17687456 | 75 | -0.4830 | -3.1955 | 6.8894 | 2.2848 | table | figure |
| segment_37 | chr10 | 97423153 | 104498951 | 7075799 | 77 | 0.7405 | -3.2661 | 10.7612 | 2.5309 | table | figure |
| segment_38 | chr10 | 104503727 | 135516024 | 31012298 | 123 | -0.9591 | -3.2428 | 5.9188 | 1.9194 | table | figure |
| segment_39 | chr11 | 127115 | 64139687 | 64012573 | 339 | 1.1132 | -3.2791 | 9.5353 | 2.7856 | table | figure |
| segment_40 | chr11 | 64532078 | 66104311 | 1572234 | 61 | 3.6046 | -2.3831 | 10.1488 | 2.5706 | table | figure |
| segment_41 | chr11 | 66104804 | 134135749 | 68030946 | 289 | 0.3603 | -3.2895 | 10.7152 | 2.6900 | table | figure |
| segment_42 | chr12 | 73725 | 52585784 | 52512060 | 231 | 0.0779 | -3.2991 | 11.5443 | 2.6470 | table | figure |
| segment_43 | chr12 | 52626304 | 58019934 | 5393631 | 99 | 1.8729 | -2.7807 | 8.9971 | 2.4733 | table | figure |
| segment_44 | chr12 | 58017193 | 133684130 | 75666938 | 302 | 0.0833 | -3.3154 | 10.4853 | 2.4093 | table | figure |
| segment_45 | chr13 | 19271143 | 103528345 | 84257203 | 172 | -0.9196 | -3.3218 | 7.0206 | 2.1967 | table | figure |
| segment_46 | chr13 | 107028911 | 111373421 | 4344511 | 13 | 0.8679 | -2.5207 | 6.2899 | 2.6676 | table | figure |
| segment_47 | chr13 | 111530887 | 114116670 | 2585784 | 11 | -0.5090 | -3.1502 | 4.7364 | 2.4184 | table | figure |
| segment_48 | chr13 | 114110134 | 115092796 | 982663 | 10 | 0.9738 | -2.1502 | 5.8303 | 2.6760 | table | figure |
| segment_49 | chr14 | 20724717 | 21945132 | 1220416 | 17 | 0.9554 | -1.5697 | 5.5552 | 1.8794 | table | figure |
| segment_50 | chr14 | 21944756 | 24910540 | 2965785 | 64 | 1.6351 | -3.2699 | 6.9609 | 2.1987 | table | figure |
| segment_51 | chr14 | 24908972 | 103049514 | 78140543 | 264 | -0.4405 | -3.3166 | 9.4866 | 2.2408 | table | figure |
| segment_52 | chr14 | 103058998 | 106445233 | 3386236 | 38 | 1.2308 | -2.8784 | 6.0088 | 2.1180 | table | figure |
| segment_53 | chr15 | 20587869 | 72672051 | 52084183 | 235 | -0.2409 | -3.2961 | 10.1781 | 2.6687 | table | figure |
| segment_54 | chr15 | 72766667 | 76005189 | 3238523 | 35 | 1.0646 | -3.2145 | 6.0009 | 2.3475 | table | figure |
| segment_55 | chr15 | 76135622 | 85682376 | 9546755 | 50 | -0.3636 | -3.2050 | 6.7080 | 2.1414 | table | figure |
| segment_56 | chr15 | 85923802 | 91506349 | 5582548 | 33 | 1.1943 | -3.1240 | 7.5430 | 2.4806 | table | figure |
| segment_57 | chr15 | 91509270 | 102516768 | 11007499 | 19 | -0.9622 | -3.3060 | 3.2547 | 1.8334 | table | figure |
| segment_58 | chr16 | 64043 | 70285833 | 70221791 | 477 | 0.9841 | -3.3210 | 9.2451 | 2.4721 | table | figure |
| segment_59 | chr16 | 70286293 | 84220669 | 13934377 | 60 | -0.4343 | -3.3201 | 3.9921 | 1.8463 | table | figure |
| segment_60 | chr16 | 84511681 | 90114181 | 5602501 | 63 | 1.2604 | -3.0606 | 8.7869 | 2.5420 | table | figure |
| segment_61 | chr17 | 254326 | 78451643 | 78197318 | 763 | 0.8129 | -3.3143 | 12.2915 | 2.6102 | table | figure |
| segment_62 | chr17 | 78518619 | 81052864 | 2534246 | 58 | 2.4043 | -2.5788 | 11.4107 | 2.8363 | table | figure |
| segment_63 | chr18 | 158383 | 45457515 | 45299133 | 79 | -0.7207 | -3.2021 | 7.7021 | 2.2806 | table | figure |
| segment_64 | chr18 | 46065417 | 47920543 | 1855127 | 8 | 0.9350 | -1.2943 | 3.7130 | 1.7185 | table | figure |
| segment_65 | chr18 | 48405419 | 77905406 | 29499988 | 44 | -0.9610 | -3.2691 | 5.2022 | 1.6438 | table | figure |
| segment_66 | chr19 | 197124 | 19312678 | 19115555 | 396 | 1.6494 | -3.2818 | 9.8794 | 2.5766 | table | figure |
| segment_67 | chr19 | 19312218 | 36036218 | 16724001 | 59 | -0.1551 | -3.2600 | 4.4556 | 2.0650 | table | figure |
| segment_68 | chr19 | 36031640 | 50980010 | 14948371 | 277 | 1.5884 | -3.2356 | 11.0569 | 2.4361 | table | figure |
| segment_69 | chr19 | 50979657 | 54635140 | 3655484 | 46 | -0.6264 | -3.2650 | 7.0602 | 3.0636 | table | figure |
| segment_70 | chr19 | 54641444 | 56729146 | 2087703 | 42 | 1.6853 | -2.8288 | 6.4297 | 2.5702 | table | figure |
| segment_71 | chr19 | 56879468 | 59111168 | 2231701 | 59 | -0.5435 | -2.7289 | 7.9051 | 2.2058 | table | figure |
| segment_72 | chr20 | 251504 | 4040760 | 3789257 | 56 | 0.9768 | -3.2685 | 8.9238 | 2.6908 | table | figure |
| segment_73 | chr20 | 4101627 | 19804587 | 15702961 | 46 | -0.7004 | -3.1188 | 3.6852 | 1.6596 | table | figure |
| segment_74 | chr20 | 19867165 | 62731996 | 42864832 | 305 | 0.5528 | -3.2053 | 10.9322 | 2.3615 | table | figure |
| segment_75 | chr21 | 11180920 | 45182188 | 34001269 | 100 | -0.4948 | -3.2444 | 9.7711 | 2.1644 | table | figure |
| segment_76 | chr21 | 45192393 | 46221934 | 1029542 | 14 | 1.2117 | -3.3131 | 5.1460 | 2.2903 | table | figure |
| segment_77 | chr21 | 46225532 | 46646478 | 420947 | 6 | 0.8318 | -1.6462 | 3.4426 | 2.1148 | table | figure |
| segment_78 | chr21 | 46683843 | 47679304 | 995462 | 14 | 2.5220 | -2.2295 | 9.1911 | 3.1769 | table | figure |
| segment_79 | chr21 | 47655047 | 48111157 | 456111 | 8 | -0.3383 | -2.6454 | 1.8682 | 1.4941 | table | figure |
| segment_80 | chr22 | 16122720 | 51239737 | 35117018 | 407 | 0.4632 | -3.2973 | 10.3852 | 2.4520 | table | figure |
| segment_81 | chrX | 220013 | 152865500 | 152645488 | 421 | 0.1594 | -3.2978 | 10.8871 | 2.5949 | table | figure |
| segment_82 | chrX | 152869952 | 153719016 | 849065 | 32 | 2.6375 | -3.2218 | 10.1131 | 3.0795 | table | figure |
| segment_83 | chrX | 153733350 | 154688276 | 954927 | 17 | 0.0335 | -3.2665 | 5.8921 | 2.6093 | table | figure |
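The per-segment columns of Table 2 can be approximated with a few lines of base R. This is a minimal sketch under assumptions: the function name `summarize_segments`, the column names of `loci`, and the toy data are all hypothetical, and it assumes that length is `end - start + 1` in base pairs (consistent with the rows above) and that size is the number of loci in a segment.

```r
# Summarize loci by segment: location, length (bp), size (number of loci),
# and basic statistics of the numeric variable.
# `loci` needs columns chromosome, start, end, value (hypothetical names).
summarize_segments <- function(loci, segment_id) {
  parts <- split(loci, segment_id)
  rows <- lapply(parts, function(d) {
    data.frame(
      chromosome = as.character(d$chromosome[1]),
      start      = min(d$start),
      end        = max(d$end),
      size       = nrow(d),
      mean       = mean(d$value),
      minimum    = min(d$value),
      maximum    = max(d$value),
      variance   = var(d$value),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  out$length <- out$end - out$start + 1
  out
}

# Toy input: five loci on chr1 assigned to two segments.
loci <- data.frame(
  chromosome = "chr1",
  start = c(100, 300, 500, 900, 1200),
  end   = c(200, 450, 800, 1000, 1500),
  value = c(1.5, 2.0, 2.5, -0.5, -1.5),
  stringsAsFactors = FALSE
)
res <- summarize_segments(loci, c(1, 1, 1, 2, 2))
```

Taking the minimum start and maximum end of the member loci would explain why adjacent segments in Table 2 can overlap by a few kilobases, although the report does not state how the boundaries were derived.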
The same criteria were applied repeatedly to identify and select segments from 10 sets of randomized data, and the summary statistics of the selected segments were compared.
Table 3. Means of summary statistics of segments identified and selected from the original data vs. multiple sets of randomized data: number of loci, segment length, mean and standard deviation of log_rpkm_T2N_Rep0 of segments. If mean log_rpkm_T2N_Rep0 of selected segments can be both positive and negative, their absolute values are used in this table.
| | size | length | mean | variance |
|---|---|---|---|---|
| original | 158.1928 | 35301623 | 0.92 | 2.4152 |
| random_1 | 59.1441 | 13102549 | 0.68 | 0.8142 |
| random_2 | 57.5877 | 12668005 | 0.68 | 0.8420 |
| random_3 | 64.0488 | 14184075 | 0.70 | 0.8490 |
| random_4 | 61.6432 | 13616218 | 0.70 | 0.8017 |
| random_5 | 76.7836 | 17051403 | 0.68 | 0.8363 |
| random_6 | 65.3234 | 14364656 | 0.67 | 0.8387 |
| random_7 | 58.0973 | 12853151 | 0.69 | 0.8366 |
| random_8 | 58.3556 | 12899710 | 0.75 | 0.8600 |
| random_9 | 64.0488 | 14032478 | 0.71 | 0.8151 |
| random_10 | 61.3551 | 13552694 | 0.68 | 0.8291 |
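The comparison against randomized data can be sketched as follows. The report does not state how the data were randomized; this example assumes the values are permuted across loci. The `toy_segmenter` function is a deliberately simple stand-in (a new segment starts whenever the sign changes) and is not fastseg, and all names and numbers here are hypothetical.

```r
# Stand-in segmenter (NOT fastseg): start a new segment at every sign change.
toy_segmenter <- function(x) cumsum(c(TRUE, diff(x > 0) != 0))

# Mean segment size (number of loci) and mean absolute segment mean,
# mirroring the use of absolute values described for Table 3.
segment_stats <- function(x, segmenter) {
  id <- segmenter(x)
  c(size = mean(tabulate(id)),
    mean = mean(abs(tapply(x, id, mean))))
}

set.seed(42)
# Toy data with two clearly separated blocks of 50 loci each.
x <- c(rnorm(50, mean = 5), rnorm(50, mean = -5))

original   <- segment_stats(x, toy_segmenter)
# Ten randomized sets: shuffle the values across loci and re-segment.
randomized <- t(replicate(10, segment_stats(sample(x), toy_segmenter)))

comparison <- rbind(original = original, randomized)
rownames(comparison)[-1] <- paste0("random_", 1:10)
```

When the original data contain real spatial structure, shuffling destroys it, so the randomized sets tend to yield shorter segments, which is the pattern Table 3 shows for segment size and length.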
Figure 3. Relationship between segment size and segment mean log_rpkm_T2N_Rep0. Each dot represents a segment derived from the original real data (blue) and randomized data (grey).
Figure 4. Distribution of segment size compared between original and randomized data.
Figure 5. Distribution of segment length compared between original and randomized data.
Figure 6. Distribution of log_rpkm_T2N_Rep0 mean of segments compared between original and randomized data.
Figure 7. Distribution of log_rpkm_T2N_Rep0 standard deviation of segments compared between original and randomized data.
Check out the RoCA home page for more information.
To reproduce this report:
Find the data analysis template you want to use and an example of its pairing YAML file here, and download the YAML example to your working directory.
To generate a new report using your own input data and parameter, edit the following items in the YAML file:
Run the code below in the R console or RStudio, preferably in a new R session:
if (!require(devtools)) { install.packages('devtools'); require(devtools); }
if (!require(RCurl)) { install.packages('RCurl'); require(RCurl); }
if (!require(RoCA)) { install_github('zhezhangsh/RoCAR'); require(RoCA); }
CreateReport(filename.yaml); # filename.yaml is the YAML file you just downloaded and edited
If the code runs without errors, go to the output folder and open the index.html file to view the report.
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] xlsx_0.6.1 vioplot_0.3.0 zoo_1.8-4
## [4] sm_2.2-5.6 fastseg_1.28.0 Biobase_2.42.0
## [7] GenomicRanges_1.34.0 GenomeInfoDb_1.18.1 IRanges_2.16.0
## [10] S4Vectors_0.20.1 BiocGenerics_0.28.0 DEGandMore_0.0.0.9000
## [13] snow_0.4-3 htmlwidgets_1.5.1 DT_0.15
## [16] kableExtra_0.9.0 awsomics_0.0.0.9000 yaml_2.2.1
## [19] rmarkdown_1.10 knitr_1.20 RoCA_0.0.0.9000
## [22] RCurl_1.95-4.11 bitops_1.0-6 devtools_2.3.1
## [25] usethis_1.6.1
##
## loaded via a namespace (and not attached):
## [1] httr_1.4.2 pkgload_1.0.2 jsonlite_1.6.1
## [4] viridisLite_0.3.0 assertthat_0.2.1 highr_0.7
## [7] xlsxjars_0.6.1 GenomeInfoDbData_1.2.0 remotes_2.2.0
## [10] sessioninfo_1.1.1 pillar_1.4.6 backports_1.1.5
## [13] lattice_0.20-38 glue_1.3.2 digest_0.6.25
## [16] XVector_0.22.0 rvest_0.3.2 colorspace_1.4-1
## [19] htmltools_0.4.0 pkgconfig_2.0.3 zlibbioc_1.28.0
## [22] scales_1.1.1 processx_3.4.2 tibble_2.1.3
## [25] ellipsis_0.3.0 withr_2.2.0 cli_2.0.2
## [28] magrittr_1.5 crayon_1.3.4 memoise_1.1.0
## [31] evaluate_0.14 ps_1.3.2 fs_1.3.2
## [34] fansi_0.4.1 xml2_1.2.0 pkgbuild_1.1.0
## [37] tools_3.5.1 prettyunits_1.1.1 hms_0.4.2
## [40] lifecycle_0.2.0 stringr_1.3.1 munsell_0.5.0
## [43] callr_3.4.3 compiler_3.5.1 rlang_0.4.5
## [46] grid_3.5.1 rstudioapi_0.11 crosstalk_1.1.0.1
## [49] tcltk_3.5.1 testthat_2.3.2 R6_2.4.1
## [52] rprojroot_1.3-2 readr_1.3.1 desc_1.2.0
## [55] rJava_0.9-11 stringi_1.2.4 Rcpp_1.0.5